Open Bug 1939561 (enus-dictionary) Opened 5 months ago Updated 2 days ago

Track new words and corrections to en-US dictionary

Categories

(Core :: Spelling Checker: en-US Dictionary, enhancement)

People

(Reporter: flod, Unassigned)

Using this bug to track and discuss requests for new words in the Mozilla en-US dictionary.

  • Try to provide information on the terms you want to add, in particular references to external sources that confirm the usage of the term (e.g. Merriam-Webster or Oxford online dictionaries).
  • Include all possible forms, e.g. plural and genitive for nouns, different tenses for verbs.
  • Names of companies or people should not be included.
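
When preparing a request, it can help to check each form separately, since a base word may already be accepted while its plural or derived forms are not. A minimal sketch of such a check follows; `missing_forms` is a hypothetical helper and the `known` set is illustrative sample data, not the actual contents of the en-US dictionary:

```python
def missing_forms(forms, known_words):
    """Return the forms that are not in known_words, preserving input order.

    The comparison is case-sensitive, because capitalization matters for
    dictionary entries.
    """
    known = set(known_words)
    return [form for form in forms if form not in known]


# Illustrative data only (assumption): a tiny stand-in for a real word list.
known = {"leak", "leaks", "paywall", "paywalls"}
request = ["leak", "leaks", "leaker", "leakers", "paywalled"]

print(missing_forms(request, known))  # ['leaker', 'leakers', 'paywalled']
```

Only the forms reported as missing would then need to be listed in the request, together with their references.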

The goal is to add words that are commonly used, not all words, as that might have a negative impact on performance (and will increase the installer size).

The list of missing words from the Wiktionary English dictionary is available at https://tdulcet.github.io/Missing-Words/ and automatically updated monthly. With the default options (only words without numbers or symbols and with a Wikipedia page), this currently includes 7,707 words for consideration. It can also be downloaded in TSV format. See Bug 1811451 comment 11 and below for more information. Since then, I have added several new options, including options to sort the words, to disable the normalization of words before checking whether they are already in the dictionary, and to show words with one or more forms not in the dictionary. Feedback is welcome.

The missing words from the Ispell small and medium American English dictionaries are available in Bug 1811451 comment 2, which includes 13,997 words for consideration. I am always looking for high-quality word lists or other ideas about how we could systematically find words that should be included in the Mozilla en-US dictionary.

TikTok (correct capitalization of the most popular video app in the world).

YouTube is already present in the dictionary.

See first comment

> Names of companies or people should not be included.

We should avoid adding more company names, because they don't last. Firefox's personal dictionaries can solve that.

Duplicate of this bug: 1944936

Bug 1550932: taekwondo

We can have a paywall but things can't be paywalled??

And we can have a leak but we can't have someone who does so, a leaker? (plus it's plural.)

"indicia" come on. this entire setup where a german native speaker controls the dictionary needs to end if you want us to pretend firefox is worth fixing

(In reply to Jack Laxson from comment #10)

> "indicia" come on. this entire setup where a german native speaker controls the dictionary needs to end if you want us to pretend firefox is worth fixing

You might want to take a look at https://bugzilla.mozilla.org/page.cgi?id=etiquette.html

Flagging missing words in this bug is fine, the rest of the comment is completely unnecessary.

TypeScript: https://en.wiktionary.org/wiki/TypeScript

JavaScript/M (coming from mozilla-specific.txt(?)) is already there.

Adding TypeScript/M alongside the existing and correct -- yet specialised and now historical -- typescript/MS would possibly save some confusion when referring to the programming language, which is twelve years old now yet still gets its name marked as a spelling error. (Anecdotally, I had to double-check and consequently unlearn writing Typescript everywhere; I really thought it differed from JS in casing, only because I was blindly trusting my spellchecker.)
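
For readers unfamiliar with the `word/FLAGS` notation: the letters after the slash select suffix rules from the dictionary's affix file. The sketch below is a simplified illustration, assuming that `M` adds the possessive `'s` and `S` adds a regular plural, as is common in en_US Hunspell affix files. The `expand` function is hypothetical and covers only these two flags; the real `.aff` file is authoritative.

```python
import re


def expand(entry: str) -> list[str]:
    """Expand a 'word/FLAGS' entry using simplified M and S suffix rules.

    Assumption: M = possessive 's, S = regular plural. Other flags are ignored.
    """
    word, _, flags = entry.partition("/")
    forms = [word]
    if "M" in flags:
        forms.append(word + "'s")
    if "S" in flags:
        if re.search(r"[^aeiou]y$", word):
            forms.append(word[:-1] + "ies")  # city -> cities
        elif re.search(r"[sxzh]$", word):
            forms.append(word + "es")        # box -> boxes
        else:
            forms.append(word + "s")         # typescript -> typescripts
    return forms


print(expand("TypeScript/M"))   # ['TypeScript', "TypeScript's"]
print(expand("typescript/MS"))  # ['typescript', "typescript's", 'typescripts']
```

Under these assumptions, the proposed TypeScript/M entry would accept the name and its possessive but not a plural, while the lowercase entry keeps all three forms.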


By the way, is maintaining these additions in the Firefox code base (i.e., NOT through the "parent" SCOWL) really the optimal solution here? Or is SCOWL fetching additions from here? I see that JavaScript is included there already: http://app.aspell.net/lookup?dict=en_US;words=JavaScript so maybe there is no reason to keep it in "Mozilla specific" any more?

Advocacy.

> We should avoid adding more company names, because they don't last.

I second that this sentiment seems potentially very harmful for the Firefox user base. Just think for a while: how many users today, and in the near future, will likely type TikTok in some spell-checked field?

> Firefox's personal dictionaries can solve that.

So are we really expecting that our users will manually maintain all "new" terms, even those that have been constantly appearing in world news for several years now?

The current .dic file contains charming echoes of long-defunct companies or products, some of which operated for only a short time, yet are still included, which is a good thing, if you ask me. I'd bet that if we counted how many times tech journalists and enthusiasts typed words like GameCube, FrontPage, ColdFusion, CompuServe, ChatZilla, DivX, BlinkList, Macromedia, and Compaq combined in the past year, that sum would be smaller than the count for TikTok alone (however sad and bitter that reality may be).


Tangential off-topic:

> [...] this entire setup where a german [sic] native speaker controls the dictionary needs to end if you want us to pretend firefox [sic] is worth fixing

Amusingly, I am mostly using the British dictionary, which is maintained by a lone Portuguese person, and from what I can tell, he is doing pretty good work.

runnable
https://www.merriam-webster.com/dictionary/runnable
https://en.wiktionary.org/wiki/runnable

Maybe consider unrunnable and nonrunnable, though I don't have a strong opinion about that – not many sources outside Wiktionary.

sequiturs, non-sequiturs

https://www.merriam-webster.com/dictionary/non%20sequiturs
https://dictionary.cambridge.org/us/dictionary/english/non-sequitur

Singular non sequitur and non-sequitur are marked as correct; only the plural forms are missing. Even sequitur by itself is marked as correct, even though it's rarer; I was unsure how fossil words (Wikipedia) that are archaic/obsolete/regional/non-English outside of set phrases should be treated, given the impracticality of detecting whether they're in a valid phrase.

As for sequiturs, maybe also consider sequuntur (the proper Latin plural) and sequituri (improper but reportedly accepted (*)):

https://en.wiktionary.org/wiki/non_sequitur#:~:text=The%20legitimate,misformed.

By the way, the word misformed from there also does not pass the spell check.

(*) I see a parallel to "octopi", which is also a wrong amalgamation of a Greek root with a Latin suffix, but is accepted even by the en-US dictionary. Nobody likes the hyper-correct octopodes, though.

I'm only gonna include one source, cause I've got a lot of entries. All of these come from my personal dictionary that I've added over the course of time that I've used Firefox.
conclusionary
https://www.merriam-webster.com/dictionary/conclusionary
eyewear
https://www.merriam-webster.com/dictionary/eyewear
headwear
https://www.merriam-webster.com/dictionary/headwear
footwear
https://www.merriam-webster.com/dictionary/footwear
pareidolia
https://www.merriam-webster.com/dictionary/pareidolia
overengineer, overengineered, overengineering
https://www.merriam-webster.com/dictionary/overengineer
overgeneralization - implied by overgeneralize
https://www.merriam-webster.com/dictionary/overgeneralization
functionalize, functionalized, functionalizing, functionalizes
https://www.merriam-webster.com/dictionary/functionalize
luminance
https://www.merriam-webster.com/dictionary/luminance
chrominance
https://www.merriam-webster.com/dictionary/chrominance
degauss, degaussed, degausser, degaussers
https://www.merriam-webster.com/dictionary/degauss
chroma
https://www.merriam-webster.com/dictionary/chroma
liminal, liminalities, liminality
https://www.merriam-webster.com/dictionary/liminal
countershading
https://www.merriam-webster.com/dictionary/countershading
refried
https://www.merriam-webster.com/dictionary/refried
memetic
https://www.merriam-webster.com/dictionary/memetic
sclera - this one is surprising
https://www.merriam-webster.com/dictionary/sclera
deplatform, deplatformed, deplatforms, deplatforming
https://www.merriam-webster.com/dictionary/deplatform
lagomorph, lagomorphs
https://www.merriam-webster.com/dictionary/lagomorph
faceplate, faceplates
https://www.merriam-webster.com/dictionary/faceplate

Also, while we're on the topic, can Firefox be the first to recognize that a capitalization error is a grammar mistake and not a spelling mistake?

stereotypical

sexually

unlikeable

authenticator
